Refactor: refine tensor dependency tracking #415
Merged
poursoul merged 1 commit into hw-native-sys:main on Apr 1, 2026
jvjhfhg (Collaborator) commented on Mar 31, 2026
- Add OUTPUT_EXISTING and NO_DEP handling for existing tensors, and split creator retention from overlap-based writer lookup.
- Store owner_task_id in Tensor, remove the CreatorMap path, and update affected orchestration examples to use the refined dependency semantics.
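To make the two bullets above concrete, here is a minimal sketch of how an argument type could select between creator retention (through `owner_task_id`) and an overlap-based writer lookup. Only `OUTPUT_EXISTING`, `NO_DEP`, `owner_task_id`, `Tensor`, and `TensorArgType` are names from this PR; the other enum values, `WriterEntry`, `kNoOwner`, `collect_deps`, and the exact per-type behaviour are assumptions for illustration and may differ from the merged code.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical, simplified types. INPUT and OUTPUT are assumed variants.
enum class TensorArgType { INPUT, OUTPUT, OUTPUT_EXISTING, NO_DEP };

constexpr int32_t kNoOwner = -1;  // assumed sentinel: buffer was not created by a task

struct Tensor {
    uint64_t addr = 0;
    uint64_t size = 0;
    int32_t owner_task_id = kNoOwner;  // task that created this buffer, if any
};

// Stand-in for one recorded writer in an overlap-tracking map.
struct WriterEntry {
    uint64_t addr;
    uint64_t size;
    int32_t task_id;
};

// Append a dependency once, preserving insertion order.
inline void add_unique(std::vector<int32_t>& deps, int32_t task_id) {
    if (std::find(deps.begin(), deps.end(), task_id) == deps.end()) {
        deps.push_back(task_id);
    }
}

// Collect producer tasks for one tensor argument (assumed policy):
//  - NO_DEP: the caller opts out of tracking, so nothing is recorded.
//  - OUTPUT: a freshly allocated buffer is assumed to have no earlier writer.
//  - INPUT / OUTPUT_EXISTING: depend on the creator, then on overlapping writers.
inline std::vector<int32_t> collect_deps(const std::vector<WriterEntry>& writers,
                                         const Tensor& t, TensorArgType type) {
    std::vector<int32_t> deps;
    if (type == TensorArgType::NO_DEP || type == TensorArgType::OUTPUT) {
        return deps;
    }
    if (t.owner_task_id != kNoOwner) {
        add_unique(deps, t.owner_task_id);  // creator retention
    }
    for (const WriterEntry& w : writers) {
        // Half-open interval overlap test on [addr, addr + size).
        const bool overlaps = w.addr < t.addr + t.size && t.addr < w.addr + w.size;
        if (overlaps) {
            add_unique(deps, w.task_id);  // overlap-based writer lookup
        }
    }
    return deps;
}
```

In this sketch the creator dependency needs no map lookup because it travels with the `Tensor` itself, which is one plausible reading of why a separate CreatorMap path becomes unnecessary.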
Force-pushed from c4a6952 to 19d6c42
Code Review
This pull request refactors the dependency tracking system by introducing creator-based tracking via a new owner_task_id field in the Tensor structure, complementing the existing OverlapMap lookups. It adds new TensorArgType variants, OUTPUT_EXISTING and NO_DEP, to better handle different buffer lifecycles and reduces complexity in the TensorMap by removing the with_alloc flag. Feedback focuses on critical safety issues where the fanin_count limit is silently enforced, which could lead to dropped dependencies and data races. Additionally, it is recommended that OUTPUT_EXISTING perform full OverlapMap lookups to maintain correctness and prevent stale entries in the TensorMap.
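The concern about a silently enforced `fanin_count` limit can be illustrated with a small sketch in which the insertion routine reports overflow to its caller instead of quietly dropping the dependency. Only `fanin_count` is a name from the review; `kMaxFanin`, `TaskDeps`, and `add_fanin` are hypothetical, and the limit of 4 is an arbitrary value chosen for the example.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kMaxFanin = 4;  // assumed limit, for illustration only

// Hypothetical per-task record of producer tasks that must finish first.
struct TaskDeps {
    std::array<int32_t, kMaxFanin> fanin{};
    std::size_t fanin_count = 0;
};

// Record a producer task. Returns false when the table is full, so the caller
// can fail loudly (assert, abort, or propagate an error) rather than run the
// task with a missing dependency.
inline bool add_fanin(TaskDeps& deps, int32_t producer_task_id) {
    for (std::size_t i = 0; i < deps.fanin_count; ++i) {
        if (deps.fanin[i] == producer_task_id) {
            return true;  // already recorded, nothing to add
        }
    }
    if (deps.fanin_count >= kMaxFanin) {
        return false;  // overflow is surfaced, not swallowed
    }
    deps.fanin[deps.fanin_count++] = producer_task_id;
    return true;
}
```

A caller that ignores the return value reintroduces the original hazard, so a real implementation would presumably check it at every call site or terminate on failure.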
Force-pushed from 19d6c42 to 1dbc46e
poursoul reviewed on Mar 31, 2026
Force-pushed from a37a79f to f11a30c
Force-pushed from f11a30c to 61a5e0a
poursoul previously approved these changes on Apr 1, 2026
Force-pushed from 61a5e0a to b9220d3
Force-pushed from b9220d3 to 2006f05
Force-pushed from 2006f05 to a05f801
poursoul approved these changes on Apr 1, 2026
ChaoZheng109 added a commit to ChaoZheng109/simpler that referenced this pull request on Apr 1, 2026

…ndency tracking

Synchronize A5 platform, runtimes, and tests with a2a3 improvements. Follows the established sync pattern.

Platform (src/a5/platform/):
- 012675a (hw-native-sys#419): Add PerfSetDeviceCallback for device context setup in mgmt_loop, buffer recycling improvements
- fe63325 (hw-native-sys#403): Fix include paths in perf_profiling.h, rename params_cycle to args_cycle

Runtime host_build_graph (src/a5/runtime/host_build_graph/):
- 012675a (hw-native-sys#419): Add implicit task profiling records in Case 1/Case 2, reinterpret_cast cleanup, license header

Runtime tensormap_and_ringbuffer (src/a5/runtime/tensormap_and_ringbuffer/):
- 7059fff (hw-native-sys#389): Encapsulate TaskOutputTensors materialization in orchestrator
- 27a85c8 (hw-native-sys#390): Const-qualify set_tensor_data Tensor parameter
- 1d97ac5 (hw-native-sys#395): Arg inherits TaskArgs, simplify orchestrator arg passing
- 121a1d5 (hw-native-sys#387): Add to_u64/from_u64 type-safe conversion utilities in orchestration API
- fe63325 (hw-native-sys#403): Defer output tensor materialization, TensorCreateInfo pointer, tensormap link_entry
- cd59b47 (hw-native-sys#404): Add SPMD context accessors, intrinsic.h, build_payload with LocalContext/GlobalContext
- 4917d12 (hw-native-sys#415): Refine tensor dependency tracking, TensorCreateInfo alignment, owner tracking
- 34a6e1c (hw-native-sys#417): SPMD multi-block dispatch, scheduler dual-queue, submit_types extensions

Tests (examples/a5/, tests/st/a5/):
- be765f1 (hw-native-sys#392): Migrate paged_attention orchestration to ChipStorageTaskArgs API
- 121a1d5 (hw-native-sys#387): Use from_u64<float> in softmax_prepare kernels
- fe63325 (hw-native-sys#403): Add license headers, NOLINT annotations, output tensor view(true)
- cd59b47 (hw-native-sys#404): Add spmd_basic example (AIC+AIV SPMD read test)
- 34a6e1c (hw-native-sys#417): Add spmd_multiblock_aiv and spmd_multiblock_mix examples
- Remove redundant end-of-kernel sync barriers in paged_attention test kernels
- Adjust paged_attention orch_thread_num (2→1), paged_attention_unroll block_dim (36→24)
ChaoZheng109
added a commit
to ChaoZheng109/simpler
that referenced
this pull request
Apr 1, 2026
…ndency tracking Synchronize A5 platform, runtimes, and tests with a2a3 improvements. Follows the established sync pattern. Platform (src/a5/platform/): - 012675a ([hw-native-sys#419](https://github.com/ChaoZheng109/simpler/issues/419)): Add PerfSetDeviceCallback for device context setup in mgmt_loop, buffer recycling improvements - fe63325 ([hw-native-sys#403](https://github.com/ChaoZheng109/simpler/issues/403)): Fix include paths in perf_profiling.h, rename params_cycle to args_cycle Runtime host_build_graph (src/a5/runtime/host_build_graph/): - 012675a ([hw-native-sys#419](https://github.com/ChaoZheng109/simpler/issues/419)): Add implicit task profiling records in Case 1/Case 2, reinterpret_cast cleanup, license header Runtime tensormap_and_ringbuffer (src/a5/runtime/tensormap_and_ringbuffer/): - 7059fff ([hw-native-sys#389](https://github.com/ChaoZheng109/simpler/issues/389)): Encapsulate TaskOutputTensors materialization in orchestrator - 27a85c8 ([hw-native-sys#390](https://github.com/ChaoZheng109/simpler/issues/390)): Const-qualify set_tensor_data Tensor parameter - 1d97ac5 ([hw-native-sys#395](https://github.com/ChaoZheng109/simpler/issues/395)): Arg inherits TaskArgs, simplify orchestrator arg passing - 121a1d5 ([hw-native-sys#387](https://github.com/ChaoZheng109/simpler/issues/387)): Add to_u64/from_u64 type-safe conversion utilities in orchestration API - fe63325 ([hw-native-sys#403](https://github.com/ChaoZheng109/simpler/issues/403)): Defer output tensor materialization, TensorCreateInfo pointer, tensormap link_entry - cd59b47 ([hw-native-sys#404](https://github.com/ChaoZheng109/simpler/issues/404)): Add SPMD context accessors, intrinsic.h, build_payload with LocalContext/GlobalContext - 4917d12 ([hw-native-sys#415](https://github.com/ChaoZheng109/simpler/issues/415)): Refine tensor dependency tracking, TensorCreateInfo alignment, owner tracking - 34a6e1c ([hw-native-sys#417](https://github.com/ChaoZheng109/simpler/issues/417)): SPMD multi-block 
dispatch, scheduler dual-queue, submit_types extensions Tests (examples/a5/, tests/st/a5/): - be765f1 ([hw-native-sys#392](https://github.com/ChaoZheng109/simpler/issues/392)): Migrate paged_attention orchestration to ChipStorageTaskArgs API - 121a1d5 ([hw-native-sys#387](https://github.com/ChaoZheng109/simpler/issues/387)): Use from_u64<float> in softmax_prepare kernels - fe63325 ([hw-native-sys#403](https://github.com/ChaoZheng109/simpler/issues/403)): Add license headers, NOLINT annotations, output tensor view(true) - cd59b47 ([hw-native-sys#404](https://github.com/ChaoZheng109/simpler/issues/404)): Add spmd_basic example (AIC+AIV SPMD read test) - 34a6e1c ([hw-native-sys#417](https://github.com/ChaoZheng109/simpler/issues/417)): Add spmd_multiblock_aiv and spmd_multiblock_mix examples - Remove redundant end-of-kernel sync barriers in paged_attention test kernels - Adjust paged_attention orch_thread_num (2→1), paged_attention_unroll block_dim (36→24)
ChaoZheng109
added a commit
to ChaoZheng109/simpler
that referenced
this pull request
Apr 1, 2026
…ndency tracking Synchronize A5 platform, runtimes, and tests with a2a3 improvements. Follows the established sync pattern. Platform (src/a5/platform/): - 012675a ([[hw-native-sys#419](https://github.com/ChaoZheng109/simpler/issues/419)](https://github.com/ChaoZheng109/simpler/issues/419)): Add PerfSetDeviceCallback for device context setup in mgmt_loop, buffer recycling improvements - fe63325 ([[hw-native-sys#403](https://github.com/ChaoZheng109/simpler/issues/403)](https://github.com/ChaoZheng109/simpler/issues/403)): Fix include paths in perf_profiling.h, rename params_cycle to args_cycle Runtime host_build_graph (src/a5/runtime/host_build_graph/): - 012675a ([[hw-native-sys#419](https://github.com/ChaoZheng109/simpler/issues/419)](https://github.com/ChaoZheng109/simpler/issues/419)): Add implicit task profiling records in Case 1/Case 2, reinterpret_cast cleanup, license header Runtime tensormap_and_ringbuffer (src/a5/runtime/tensormap_and_ringbuffer/): - 7059fff ([[hw-native-sys#389](https://github.com/ChaoZheng109/simpler/issues/389)](https://github.com/ChaoZheng109/simpler/issues/389)): Encapsulate TaskOutputTensors materialization in orchestrator - 27a85c8 ([[hw-native-sys#390](https://github.com/ChaoZheng109/simpler/issues/390)](https://github.com/ChaoZheng109/simpler/issues/390)): Const-qualify set_tensor_data Tensor parameter - 1d97ac5 ([[hw-native-sys#395](https://github.com/ChaoZheng109/simpler/issues/395)](https://github.com/ChaoZheng109/simpler/issues/395)): Arg inherits TaskArgs, simplify orchestrator arg passing - 121a1d5 ([[hw-native-sys#387](https://github.com/ChaoZheng109/simpler/issues/387)](https://github.com/ChaoZheng109/simpler/issues/387)): Add to_u64/from_u64 type-safe conversion utilities in orchestration API - fe63325 ([[hw-native-sys#403](https://github.com/ChaoZheng109/simpler/issues/403)](https://github.com/ChaoZheng109/simpler/issues/403)): Defer output tensor materialization, TensorCreateInfo pointer, tensormap link_entry - cd59b47 
([[hw-native-sys#404](https://github.com/ChaoZheng109/simpler/issues/404)](https://github.com/ChaoZheng109/simpler/issues/404)): Add SPMD context accessors, intrinsic.h, build_payload with LocalContext/GlobalContext - 4917d12 ([[hw-native-sys#415](https://github.com/ChaoZheng109/simpler/issues/415)](https://github.com/ChaoZheng109/simpler/issues/415)): Refine tensor dependency tracking, TensorCreateInfo alignment, owner tracking - 34a6e1c ([[hw-native-sys#417](https://github.com/ChaoZheng109/simpler/issues/417)](https://github.com/ChaoZheng109/simpler/issues/417)): SPMD multi-block dispatch, scheduler dual-queue, submit_types extensions Tests (examples/a5/, tests/st/a5/): - be765f1 ([[hw-native-sys#392](https://github.com/ChaoZheng109/simpler/issues/392)](https://github.com/ChaoZheng109/simpler/issues/392)): Migrate paged_attention orchestration to ChipStorageTaskArgs API - 121a1d5 ([[hw-native-sys#387](https://github.com/ChaoZheng109/simpler/issues/387)](https://github.com/ChaoZheng109/simpler/issues/387)): Use from_u64<float> in softmax_prepare kernels - fe63325 ([[hw-native-sys#403](https://github.com/ChaoZheng109/simpler/issues/403)](https://github.com/ChaoZheng109/simpler/issues/403)): Add license headers, NOLINT annotations, output tensor view(true) - cd59b47 ([[hw-native-sys#404](https://github.com/ChaoZheng109/simpler/issues/404)](https://github.com/ChaoZheng109/simpler/issues/404)): Add spmd_basic example (AIC+AIV SPMD read test) - 34a6e1c ([[hw-native-sys#417](https://github.com/ChaoZheng109/simpler/issues/417)](https://github.com/ChaoZheng109/simpler/issues/417)): Add spmd_multiblock_aiv and spmd_multiblock_mix examples - Remove redundant end-of-kernel sync barriers in paged_attention test kernels - Adjust paged_attention orch_thread_num (2→1), paged_attention_unroll block_dim (36→24)
ChaoZheng109
added a commit
to ChaoZheng109/simpler
that referenced
this pull request
Apr 1, 2026
…ndency tracking Synchronize A5 platform, runtimes, and tests with a2a3 improvements. Follows the established sync pattern. Platform (src/a5/platform/): - 012675a ([[[hw-native-sys#419](https://github.com/ChaoZheng109/simpler/issues/419)](https://github.com/ChaoZheng109/simpler/issues/419)](https://github.com/ChaoZheng109/simpler/issues/419)): Add PerfSetDeviceCallback for device context setup in mgmt_loop, buffer recycling improvements - fe63325 ([[[hw-native-sys#403](https://github.com/ChaoZheng109/simpler/issues/403)](https://github.com/ChaoZheng109/simpler/issues/403)](https://github.com/ChaoZheng109/simpler/issues/403)): Fix include paths in perf_profiling.h, rename params_cycle to args_cycle Runtime host_build_graph (src/a5/runtime/host_build_graph/): - 012675a ([[[hw-native-sys#419](https://github.com/ChaoZheng109/simpler/issues/419)](https://github.com/ChaoZheng109/simpler/issues/419)](https://github.com/ChaoZheng109/simpler/issues/419)): Add implicit task profiling records in Case 1/Case 2, reinterpret_cast cleanup, license header Runtime tensormap_and_ringbuffer (src/a5/runtime/tensormap_and_ringbuffer/): - 7059fff ([[[hw-native-sys#389](https://github.com/ChaoZheng109/simpler/issues/389)](https://github.com/ChaoZheng109/simpler/issues/389)](https://github.com/ChaoZheng109/simpler/issues/389)): Encapsulate TaskOutputTensors materialization in orchestrator - 27a85c8 ([[[hw-native-sys#390](https://github.com/ChaoZheng109/simpler/issues/390)](https://github.com/ChaoZheng109/simpler/issues/390)](https://github.com/ChaoZheng109/simpler/issues/390)): Const-qualify set_tensor_data Tensor parameter - 1d97ac5 ([[[hw-native-sys#395](https://github.com/ChaoZheng109/simpler/issues/395)](https://github.com/ChaoZheng109/simpler/issues/395)](https://github.com/ChaoZheng109/simpler/issues/395)): Arg inherits TaskArgs, simplify orchestrator arg passing - 121a1d5 
([[[hw-native-sys#387](https://github.com/ChaoZheng109/simpler/issues/387)](https://github.com/ChaoZheng109/simpler/issues/387)](https://github.com/ChaoZheng109/simpler/issues/387)): Add to_u64/from_u64 type-safe conversion utilities in orchestration API - fe63325 ([[[hw-native-sys#403](https://github.com/ChaoZheng109/simpler/issues/403)](https://github.com/ChaoZheng109/simpler/issues/403)](https://github.com/ChaoZheng109/simpler/issues/403)): Defer output tensor materialization, TensorCreateInfo pointer, tensormap link_entry - cd59b47 ([[[hw-native-sys#404](https://github.com/ChaoZheng109/simpler/issues/404)](https://github.com/ChaoZheng109/simpler/issues/404)](https://github.com/ChaoZheng109/simpler/issues/404)): Add SPMD context accessors, intrinsic.h, build_payload with LocalContext/GlobalContext - 4917d12 ([[[hw-native-sys#415](https://github.com/ChaoZheng109/simpler/issues/415)](https://github.com/ChaoZheng109/simpler/issues/415)](https://github.com/ChaoZheng109/simpler/issues/415)): Refine tensor dependency tracking, TensorCreateInfo alignment, owner tracking - 34a6e1c ([[[hw-native-sys#417](https://github.com/ChaoZheng109/simpler/issues/417)](https://github.com/ChaoZheng109/simpler/issues/417)](https://github.com/ChaoZheng109/simpler/issues/417)): SPMD multi-block dispatch, scheduler dual-queue, submit_types extensions Tests (examples/a5/, tests/st/a5/): - be765f1 ([[[hw-native-sys#392](https://github.com/ChaoZheng109/simpler/issues/392)](https://github.com/ChaoZheng109/simpler/issues/392)](https://github.com/ChaoZheng109/simpler/issues/392)): Migrate paged_attention orchestration to ChipStorageTaskArgs API - 121a1d5 ([[[hw-native-sys#387](https://github.com/ChaoZheng109/simpler/issues/387)](https://github.com/ChaoZheng109/simpler/issues/387)](https://github.com/ChaoZheng109/simpler/issues/387)): Use from_u64<float> in softmax_prepare kernels - fe63325 
([[[hw-native-sys#403](https://github.com/ChaoZheng109/simpler/issues/403)](https://github.com/ChaoZheng109/simpler/issues/403)](https://github.com/ChaoZheng109/simpler/issues/403)): Add license headers, NOLINT annotations, output tensor view(true) - cd59b47 ([[[hw-native-sys#404](https://github.com/ChaoZheng109/simpler/issues/404)](https://github.com/ChaoZheng109/simpler/issues/404)](https://github.com/ChaoZheng109/simpler/issues/404)): Add spmd_basic example (AIC+AIV SPMD read test) - 34a6e1c ([[[hw-native-sys#417](https://github.com/ChaoZheng109/simpler/issues/417)](https://github.com/ChaoZheng109/simpler/issues/417)](https://github.com/ChaoZheng109/simpler/issues/417)): Add spmd_multiblock_aiv and spmd_multiblock_mix examples - Remove redundant end-of-kernel sync barriers in paged_attention test kernels - Adjust paged_attention orch_thread_num (2→1), paged_attention_unroll block_dim (36→24)
ChaoZheng109
added a commit
to ChaoZheng109/simpler
that referenced
this pull request
Apr 1, 2026
…ndency tracking Synchronize A5 platform, runtimes, and tests with a2a3 improvements. Follows the established sync pattern. Platform (src/a5/platform/): - 012675a ([[[[hw-native-sys#419](https://github.com/ChaoZheng109/simpler/issues/419)](https://github.com/ChaoZheng109/simpler/issues/419)](https://github.com/ChaoZheng109/simpler/issues/419)](https://github.com/ChaoZheng109/simpler/issues/419)): Add PerfSetDeviceCallback for device context setup in mgmt_loop, buffer recycling improvements - fe63325 ([[[[hw-native-sys#403](https://github.com/ChaoZheng109/simpler/issues/403)](https://github.com/ChaoZheng109/simpler/issues/403)](https://github.com/ChaoZheng109/simpler/issues/403)](https://github.com/ChaoZheng109/simpler/issues/403)): Fix include paths in perf_profiling.h, rename params_cycle to args_cycle Runtime host_build_graph (src/a5/runtime/host_build_graph/): - 012675a ([[[[hw-native-sys#419](https://github.com/ChaoZheng109/simpler/issues/419)](https://github.com/ChaoZheng109/simpler/issues/419)](https://github.com/ChaoZheng109/simpler/issues/419)](https://github.com/ChaoZheng109/simpler/issues/419)): Add implicit task profiling records in Case 1/Case 2, reinterpret_cast cleanup, license header Runtime tensormap_and_ringbuffer (src/a5/runtime/tensormap_and_ringbuffer/): - 7059fff ([[[[hw-native-sys#389](https://github.com/ChaoZheng109/simpler/issues/389)](https://github.com/ChaoZheng109/simpler/issues/389)](https://github.com/ChaoZheng109/simpler/issues/389)](https://github.com/ChaoZheng109/simpler/issues/389)): Encapsulate TaskOutputTensors materialization in orchestrator - 27a85c8 ([[[[hw-native-sys#390](https://github.com/ChaoZheng109/simpler/issues/390)](https://github.com/ChaoZheng109/simpler/issues/390)](https://github.com/ChaoZheng109/simpler/issues/390)](https://github.com/ChaoZheng109/simpler/issues/390)): Const-qualify set_tensor_data Tensor parameter - 1d97ac5 
([[[[hw-native-sys#395](https://github.com/ChaoZheng109/simpler/issues/395)](https://github.com/ChaoZheng109/simpler/issues/395)](https://github.com/ChaoZheng109/simpler/issues/395)](https://github.com/ChaoZheng109/simpler/issues/395)): Arg inherits TaskArgs, simplify orchestrator arg passing - 121a1d5 ([[[[hw-native-sys#387](https://github.com/ChaoZheng109/simpler/issues/387)](https://github.com/ChaoZheng109/simpler/issues/387)](https://github.com/ChaoZheng109/simpler/issues/387)](https://github.com/ChaoZheng109/simpler/issues/387)): Add to_u64/from_u64 type-safe conversion utilities in orchestration API - fe63325 ([[[[hw-native-sys#403](https://github.com/ChaoZheng109/simpler/issues/403)](https://github.com/ChaoZheng109/simpler/issues/403)](https://github.com/ChaoZheng109/simpler/issues/403)](https://github.com/ChaoZheng109/simpler/issues/403)): Defer output tensor materialization, TensorCreateInfo pointer, tensormap link_entry - cd59b47 ([[[[hw-native-sys#404](https://github.com/ChaoZheng109/simpler/issues/404)](https://github.com/ChaoZheng109/simpler/issues/404)](https://github.com/ChaoZheng109/simpler/issues/404)](https://github.com/ChaoZheng109/simpler/issues/404)): Add SPMD context accessors, intrinsic.h, build_payload with LocalContext/GlobalContext - 4917d12 ([[[[hw-native-sys#415](https://github.com/ChaoZheng109/simpler/issues/415)](https://github.com/ChaoZheng109/simpler/issues/415)](https://github.com/ChaoZheng109/simpler/issues/415)](https://github.com/ChaoZheng109/simpler/issues/415)): Refine tensor dependency tracking, TensorCreateInfo alignment, owner tracking - 34a6e1c ([[[[hw-native-sys#417](https://github.com/ChaoZheng109/simpler/issues/417)](https://github.com/ChaoZheng109/simpler/issues/417)](https://github.com/ChaoZheng109/simpler/issues/417)](https://github.com/ChaoZheng109/simpler/issues/417)): SPMD multi-block dispatch, scheduler dual-queue, submit_types extensions Tests (examples/a5/, tests/st/a5/): - be765f1 
([[[[hw-native-sys#392](https://github.com/ChaoZheng109/simpler/issues/392)](https://github.com/ChaoZheng109/simpler/issues/392)](https://github.com/ChaoZheng109/simpler/issues/392)](https://github.com/ChaoZheng109/simpler/issues/392)): Migrate paged_attention orchestration to ChipStorageTaskArgs API - 121a1d5 ([[[[hw-native-sys#387](https://github.com/ChaoZheng109/simpler/issues/387)](https://github.com/ChaoZheng109/simpler/issues/387)](https://github.com/ChaoZheng109/simpler/issues/387)](https://github.com/ChaoZheng109/simpler/issues/387)): Use from_u64<float> in softmax_prepare kernels - fe63325 ([[[[hw-native-sys#403](https://github.com/ChaoZheng109/simpler/issues/403)](https://github.com/ChaoZheng109/simpler/issues/403)](https://github.com/ChaoZheng109/simpler/issues/403)](https://github.com/ChaoZheng109/simpler/issues/403)): Add license headers, NOLINT annotations, output tensor view(true) - cd59b47 ([[[[hw-native-sys#404](https://github.com/ChaoZheng109/simpler/issues/404)](https://github.com/ChaoZheng109/simpler/issues/404)](https://github.com/ChaoZheng109/simpler/issues/404)](https://github.com/ChaoZheng109/simpler/issues/404)): Add spmd_basic example (AIC+AIV SPMD read test) - 34a6e1c ([[[[hw-native-sys#417](https://github.com/ChaoZheng109/simpler/issues/417)](https://github.com/ChaoZheng109/simpler/issues/417)](https://github.com/ChaoZheng109/simpler/issues/417)](https://github.com/ChaoZheng109/simpler/issues/417)): Add spmd_multiblock_aiv and spmd_multiblock_mix examples - Remove redundant end-of-kernel sync barriers in paged_attention test kernels - Adjust paged_attention orch_thread_num (2→1), paged_attention_unroll block_dim (36→24)
ChaoZheng109
added a commit
to ChaoZheng109/simpler
that referenced
this pull request
Apr 2, 2026
…ndency tracking Synchronize A5 platform, runtimes, and tests with a2a3 improvements. Follows the established sync pattern. Platform (src/a5/platform/): - 012675a ([[[[[hw-native-sys#419](https://github.com/ChaoZheng109/simpler/issues/419)](https://github.com/ChaoZheng109/simpler/issues/419)](https://github.com/ChaoZheng109/simpler/issues/419)](https://github.com/ChaoZheng109/simpler/issues/419)](https://github.com/ChaoZheng109/simpler/issues/419)): Add PerfSetDeviceCallback for device context setup in mgmt_loop, buffer recycling improvements - fe63325 ([[[[[hw-native-sys#403](https://github.com/ChaoZheng109/simpler/issues/403)](https://github.com/ChaoZheng109/simpler/issues/403)](https://github.com/ChaoZheng109/simpler/issues/403)](https://github.com/ChaoZheng109/simpler/issues/403)](https://github.com/ChaoZheng109/simpler/issues/403)): Fix include paths in perf_profiling.h, rename params_cycle to args_cycle Runtime host_build_graph (src/a5/runtime/host_build_graph/): - 012675a ([[[[[hw-native-sys#419](https://github.com/ChaoZheng109/simpler/issues/419)](https://github.com/ChaoZheng109/simpler/issues/419)](https://github.com/ChaoZheng109/simpler/issues/419)](https://github.com/ChaoZheng109/simpler/issues/419)](https://github.com/ChaoZheng109/simpler/issues/419)): Add implicit task profiling records in Case 1/Case 2, reinterpret_cast cleanup, license header Runtime tensormap_and_ringbuffer (src/a5/runtime/tensormap_and_ringbuffer/): - 7059fff ([[[[[hw-native-sys#389](https://github.com/ChaoZheng109/simpler/issues/389)](https://github.com/ChaoZheng109/simpler/issues/389)](https://github.com/ChaoZheng109/simpler/issues/389)](https://github.com/ChaoZheng109/simpler/issues/389)](https://github.com/ChaoZheng109/simpler/issues/389)): Encapsulate TaskOutputTensors materialization in orchestrator - 27a85c8 
([[[[[hw-native-sys#390](https://github.com/ChaoZheng109/simpler/issues/390)](https://github.com/ChaoZheng109/simpler/issues/390)](https://github.com/ChaoZheng109/simpler/issues/390)](https://github.com/ChaoZheng109/simpler/issues/390)](https://github.com/ChaoZheng109/simpler/issues/390)): Const-qualify set_tensor_data Tensor parameter - 1d97ac5 ([[[[[hw-native-sys#395](https://github.com/ChaoZheng109/simpler/issues/395)](https://github.com/ChaoZheng109/simpler/issues/395)](https://github.com/ChaoZheng109/simpler/issues/395)](https://github.com/ChaoZheng109/simpler/issues/395)](https://github.com/ChaoZheng109/simpler/issues/395)): Arg inherits TaskArgs, simplify orchestrator arg passing - 121a1d5 ([[[[[hw-native-sys#387](https://github.com/ChaoZheng109/simpler/issues/387)](https://github.com/ChaoZheng109/simpler/issues/387)](https://github.com/ChaoZheng109/simpler/issues/387)](https://github.com/ChaoZheng109/simpler/issues/387)](https://github.com/ChaoZheng109/simpler/issues/387)): Add to_u64/from_u64 type-safe conversion utilities in orchestration API - fe63325 ([[[[[hw-native-sys#403](https://github.com/ChaoZheng109/simpler/issues/403)](https://github.com/ChaoZheng109/simpler/issues/403)](https://github.com/ChaoZheng109/simpler/issues/403)](https://github.com/ChaoZheng109/simpler/issues/403)](https://github.com/ChaoZheng109/simpler/issues/403)): Defer output tensor materialization, TensorCreateInfo pointer, tensormap link_entry - cd59b47 ([[[[[hw-native-sys#404](https://github.com/ChaoZheng109/simpler/issues/404)](https://github.com/ChaoZheng109/simpler/issues/404)](https://github.com/ChaoZheng109/simpler/issues/404)](https://github.com/ChaoZheng109/simpler/issues/404)](https://github.com/ChaoZheng109/simpler/issues/404)): Add SPMD context accessors, intrinsic.h, build_payload with LocalContext/GlobalContext - 4917d12 
([[[[[hw-native-sys#415](https://github.com/ChaoZheng109/simpler/issues/415)](https://github.com/ChaoZheng109/simpler/issues/415)](https://github.com/ChaoZheng109/simpler/issues/415)](https://github.com/ChaoZheng109/simpler/issues/415)](https://github.com/ChaoZheng109/simpler/issues/415)): Refine tensor dependency tracking, TensorCreateInfo alignment, owner tracking - 34a6e1c ([[[[[hw-native-sys#417](https://github.com/ChaoZheng109/simpler/issues/417)](https://github.com/ChaoZheng109/simpler/issues/417)](https://github.com/ChaoZheng109/simpler/issues/417)](https://github.com/ChaoZheng109/simpler/issues/417)](https://github.com/ChaoZheng109/simpler/issues/417)): SPMD multi-block dispatch, scheduler dual-queue, submit_types extensions Tests (examples/a5/, tests/st/a5/): - be765f1 ([[[[[hw-native-sys#392](https://github.com/ChaoZheng109/simpler/issues/392)](https://github.com/ChaoZheng109/simpler/issues/392)](https://github.com/ChaoZheng109/simpler/issues/392)](https://github.com/ChaoZheng109/simpler/issues/392)](https://github.com/ChaoZheng109/simpler/issues/392)): Migrate paged_attention orchestration to ChipStorageTaskArgs API - 121a1d5 ([[[[[hw-native-sys#387](https://github.com/ChaoZheng109/simpler/issues/387)](https://github.com/ChaoZheng109/simpler/issues/387)](https://github.com/ChaoZheng109/simpler/issues/387)](https://github.com/ChaoZheng109/simpler/issues/387)](https://github.com/ChaoZheng109/simpler/issues/387)): Use from_u64<float> in softmax_prepare kernels - fe63325 ([[[[[hw-native-sys#403](https://github.com/ChaoZheng109/simpler/issues/403)](https://github.com/ChaoZheng109/simpler/issues/403)](https://github.com/ChaoZheng109/simpler/issues/403)](https://github.com/ChaoZheng109/simpler/issues/403)](https://github.com/ChaoZheng109/simpler/issues/403)): Add license headers, NOLINT annotations, output tensor view(true) - cd59b47 
([[[[[hw-native-sys#404](https://github.com/ChaoZheng109/simpler/issues/404)](https://github.com/ChaoZheng109/simpler/issues/404)](https://github.com/ChaoZheng109/simpler/issues/404)](https://github.com/ChaoZheng109/simpler/issues/404)](https://github.com/ChaoZheng109/simpler/issues/404)): Add spmd_basic example (AIC+AIV SPMD read test) - 34a6e1c ([[[[[hw-native-sys#417](https://github.com/ChaoZheng109/simpler/issues/417)](https://github.com/ChaoZheng109/simpler/issues/417)](https://github.com/ChaoZheng109/simpler/issues/417)](https://github.com/ChaoZheng109/simpler/issues/417)](https://github.com/ChaoZheng109/simpler/issues/417)): Add spmd_multiblock_aiv and spmd_multiblock_mix examples - Remove redundant end-of-kernel sync barriers in paged_attention test kernels - Adjust paged_attention orch_thread_num (2→1), paged_attention_unroll block_dim (36→24)
ChaoZheng109
added a commit
to ChaoZheng109/simpler
that referenced
this pull request
Apr 2, 2026
…ndency tracking Synchronize A5 platform, runtimes, and tests with a2a3 improvements. Follows the established sync pattern. Platform (src/a5/platform/): - 012675a ([[[[[[hw-native-sys#419](https://github.com/ChaoZheng109/simpler/issues/419)](https://github.com/ChaoZheng109/simpler/issues/419)](https://github.com/ChaoZheng109/simpler/issues/419)](https://github.com/ChaoZheng109/simpler/issues/419)](https://github.com/ChaoZheng109/simpler/issues/419)](https://github.com/ChaoZheng109/simpler/issues/419)): Add PerfSetDeviceCallback for device context setup in mgmt_loop, buffer recycling improvements - fe63325 ([[[[[[hw-native-sys#403](https://github.com/ChaoZheng109/simpler/issues/403)](https://github.com/ChaoZheng109/simpler/issues/403)](https://github.com/ChaoZheng109/simpler/issues/403)](https://github.com/ChaoZheng109/simpler/issues/403)](https://github.com/ChaoZheng109/simpler/issues/403)](https://github.com/ChaoZheng109/simpler/issues/403)): Fix include paths in perf_profiling.h, rename params_cycle to args_cycle Runtime host_build_graph (src/a5/runtime/host_build_graph/): - 012675a ([[[[[[hw-native-sys#419](https://github.com/ChaoZheng109/simpler/issues/419)](https://github.com/ChaoZheng109/simpler/issues/419)](https://github.com/ChaoZheng109/simpler/issues/419)](https://github.com/ChaoZheng109/simpler/issues/419)](https://github.com/ChaoZheng109/simpler/issues/419)](https://github.com/ChaoZheng109/simpler/issues/419)): Add implicit task profiling records in Case 1/Case 2, reinterpret_cast cleanup, license header Runtime tensormap_and_ringbuffer (src/a5/runtime/tensormap_and_ringbuffer/): - 7059fff ([[[[[[hw-native-sys#389](https://github.com/ChaoZheng109/simpler/issues/389)](https://github.com/ChaoZheng109/simpler/issues/389)](https://github.com/ChaoZheng109/simpler/issues/389)](https://github.com/ChaoZheng109/simpler/issues/389)](https://github.com/ChaoZheng109/simpler/issues/389)](https://github.com/ChaoZheng109/simpler/issues/389)): Encapsulate 
TaskOutputTensors materialization in orchestrator
- 27a85c8 ([hw-native-sys#390](https://github.com/ChaoZheng109/simpler/issues/390)): Const-qualify set_tensor_data Tensor parameter
- 1d97ac5 ([hw-native-sys#395](https://github.com/ChaoZheng109/simpler/issues/395)): Arg inherits TaskArgs, simplify orchestrator arg passing
- 121a1d5 ([hw-native-sys#387](https://github.com/ChaoZheng109/simpler/issues/387)): Add to_u64/from_u64 type-safe conversion utilities in orchestration API
- fe63325 ([hw-native-sys#403](https://github.com/ChaoZheng109/simpler/issues/403)): Defer output tensor materialization, TensorCreateInfo pointer, tensormap link_entry
- cd59b47 ([hw-native-sys#404](https://github.com/ChaoZheng109/simpler/issues/404)): Add SPMD context accessors, intrinsic.h, build_payload with LocalContext/GlobalContext
- 4917d12 ([hw-native-sys#415](https://github.com/ChaoZheng109/simpler/issues/415)): Refine tensor dependency tracking, TensorCreateInfo alignment, owner tracking
- 34a6e1c ([hw-native-sys#417](https://github.com/ChaoZheng109/simpler/issues/417)): SPMD multi-block dispatch, scheduler dual-queue, submit_types extensions

Tests (examples/a5/, tests/st/a5/):
- be765f1 ([hw-native-sys#392](https://github.com/ChaoZheng109/simpler/issues/392)): Migrate paged_attention orchestration to ChipStorageTaskArgs API
- 121a1d5 ([hw-native-sys#387](https://github.com/ChaoZheng109/simpler/issues/387)): Use from_u64<float> in softmax_prepare kernels
- fe63325 ([hw-native-sys#403](https://github.com/ChaoZheng109/simpler/issues/403)): Add license headers, NOLINT annotations, output tensor view(true)
- cd59b47 ([hw-native-sys#404](https://github.com/ChaoZheng109/simpler/issues/404)): Add spmd_basic example (AIC+AIV SPMD read test)
- 34a6e1c ([hw-native-sys#417](https://github.com/ChaoZheng109/simpler/issues/417)): Add spmd_multiblock_aiv and spmd_multiblock_mix examples
- Remove redundant end-of-kernel sync barriers in paged_attention test kernels
- Adjust paged_attention orch_thread_num (2→1), paged_attention_unroll block_dim (36→24)
ChaoZheng109
added a commit
to ChaoZheng109/simpler
that referenced
this pull request
Apr 2, 2026
…ndency tracking

Synchronize A5 platform, runtimes, and tests with a2a3 improvements. Follows the established sync pattern.

Platform (src/a5/platform/):
- 012675a ([hw-native-sys#419](https://github.com/ChaoZheng109/simpler/issues/419)): Add PerfSetDeviceCallback for device context setup in mgmt_loop, buffer recycling improvements
- fe63325 ([hw-native-sys#403](https://github.com/ChaoZheng109/simpler/issues/403)): Fix include paths in perf_profiling.h, rename params_cycle to args_cycle

Runtime host_build_graph (src/a5/runtime/host_build_graph/):
- 012675a ([hw-native-sys#419](https://github.com/ChaoZheng109/simpler/issues/419)): Add implicit task profiling records in Case 1/Case 2, reinterpret_cast cleanup, license header

Runtime tensormap_and_ringbuffer (src/a5/runtime/tensormap_and_ringbuffer/):
- 7059fff ([hw-native-sys#389](https://github.com/ChaoZheng109/simpler/issues/389)): Encapsulate TaskOutputTensors materialization in orchestrator
- 27a85c8 ([hw-native-sys#390](https://github.com/ChaoZheng109/simpler/issues/390)): Const-qualify set_tensor_data Tensor parameter
- 1d97ac5 ([hw-native-sys#395](https://github.com/ChaoZheng109/simpler/issues/395)): Arg inherits TaskArgs, simplify orchestrator arg passing
- 121a1d5 ([hw-native-sys#387](https://github.com/ChaoZheng109/simpler/issues/387)): Add to_u64/from_u64 type-safe conversion utilities in orchestration API
- fe63325 ([hw-native-sys#403](https://github.com/ChaoZheng109/simpler/issues/403)): Defer output tensor materialization, TensorCreateInfo pointer, tensormap link_entry
- cd59b47 ([hw-native-sys#404](https://github.com/ChaoZheng109/simpler/issues/404)): Add SPMD context accessors, intrinsic.h, build_payload with LocalContext/GlobalContext
- 4917d12 ([hw-native-sys#415](https://github.com/ChaoZheng109/simpler/issues/415)): Refine tensor dependency tracking, TensorCreateInfo alignment, owner tracking
- 34a6e1c ([hw-native-sys#417](https://github.com/ChaoZheng109/simpler/issues/417)): SPMD multi-block dispatch, scheduler dual-queue, submit_types extensions

Tests (examples/a5/, tests/st/a5/):
- be765f1 ([hw-native-sys#392](https://github.com/ChaoZheng109/simpler/issues/392)): Migrate paged_attention orchestration to ChipStorageTaskArgs API
- 121a1d5 ([hw-native-sys#387](https://github.com/ChaoZheng109/simpler/issues/387)): Use from_u64<float> in softmax_prepare kernels
- fe63325 ([hw-native-sys#403](https://github.com/ChaoZheng109/simpler/issues/403)): Add license headers, NOLINT annotations, output tensor view(true)
- cd59b47 ([hw-native-sys#404](https://github.com/ChaoZheng109/simpler/issues/404)): Add spmd_basic example (AIC+AIV SPMD read test)
- 34a6e1c ([hw-native-sys#417](https://github.com/ChaoZheng109/simpler/issues/417)): Add spmd_multiblock_aiv and spmd_multiblock_mix examples
- Remove redundant end-of-kernel sync barriers in paged_attention test kernels
- Adjust paged_attention orch_thread_num (2→1), paged_attention_unroll block_dim (36→24)
ChaoWao
pushed a commit
that referenced
this pull request
Apr 2, 2026
…ndency tracking (#426)

Synchronize A5 platform, runtimes, and tests with a2a3 improvements. Follows the established sync pattern.

Platform (src/a5/platform/):
- 012675a ([#419](https://github.com/ChaoZheng109/simpler/issues/419)): Add PerfSetDeviceCallback for device context setup in mgmt_loop, buffer recycling improvements
- fe63325 ([#403](https://github.com/ChaoZheng109/simpler/issues/403)): Fix include paths in perf_profiling.h, rename params_cycle to args_cycle

Runtime host_build_graph (src/a5/runtime/host_build_graph/):
- 012675a ([#419](https://github.com/ChaoZheng109/simpler/issues/419)): Add implicit task profiling records in Case 1/Case 2, reinterpret_cast cleanup, license header

Runtime tensormap_and_ringbuffer (src/a5/runtime/tensormap_and_ringbuffer/):
- 7059fff ([#389](https://github.com/ChaoZheng109/simpler/issues/389)): Encapsulate TaskOutputTensors materialization in orchestrator
- 27a85c8 ([#390](https://github.com/ChaoZheng109/simpler/issues/390)): Const-qualify set_tensor_data Tensor parameter
- 1d97ac5 ([#395](https://github.com/ChaoZheng109/simpler/issues/395)): Arg inherits TaskArgs, simplify orchestrator arg passing
- 121a1d5 ([#387](https://github.com/ChaoZheng109/simpler/issues/387)): Add to_u64/from_u64 type-safe conversion utilities in orchestration API
- fe63325 ([#403](https://github.com/ChaoZheng109/simpler/issues/403)): Defer output tensor materialization, TensorCreateInfo pointer, tensormap link_entry
- cd59b47 ([#404](https://github.com/ChaoZheng109/simpler/issues/404)): Add SPMD context accessors, intrinsic.h, build_payload with LocalContext/GlobalContext
- 4917d12 ([#415](https://github.com/ChaoZheng109/simpler/issues/415)): Refine tensor dependency tracking, TensorCreateInfo alignment, owner tracking
- 34a6e1c ([#417](https://github.com/ChaoZheng109/simpler/issues/417)): SPMD multi-block dispatch, scheduler dual-queue, submit_types extensions

Tests (examples/a5/, tests/st/a5/):
- be765f1 ([#392](https://github.com/ChaoZheng109/simpler/issues/392)): Migrate paged_attention orchestration to ChipStorageTaskArgs API
- 121a1d5 ([#387](https://github.com/ChaoZheng109/simpler/issues/387)): Use from_u64<float> in softmax_prepare kernels
- fe63325 ([#403](https://github.com/ChaoZheng109/simpler/issues/403)): Add license headers, NOLINT annotations, output tensor view(true)
- cd59b47 ([#404](https://github.com/ChaoZheng109/simpler/issues/404)): Add spmd_basic example (AIC+AIV SPMD read test)
- 34a6e1c ([#417](https://github.com/ChaoZheng109/simpler/issues/417)): Add spmd_multiblock_aiv and spmd_multiblock_mix examples
- Remove redundant end-of-kernel sync barriers in paged_attention test kernels
- Adjust paged_attention orch_thread_num (2→1), paged_attention_unroll block_dim (36→24)